Combining Machine Learning with Dictionary Lookup for Chemical Compound and Drug Name Recognition Task

Authors

  • Lishuang Li
  • Rui Guo
  • Shanshan Liu
  • Panpan Zhang
  • Tianfu Zheng
  • Degen Huang
  • Huiwei Zhou

Abstract

Following the interest in Named Entity Recognition in academic literature shown by the Gene Mention recognition tasks of BioCreative I and II, BioCreative IV aims to encourage the development of systems for detecting mentions of chemical compounds and drugs. Machine learning methods have been highly successful at identifying gene and protein names, and dictionary lookup can cope with the variable naming conventions of chemical and drug names. We therefore combine the two approaches by using dictionary lookup results as features for the machine learning model. Our system is based on Conditional Random Fields (CRF).
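The idea of feeding dictionary results to a sequence labeler as features can be sketched as follows. This is a minimal illustration, not the authors' system: the abstract does not specify the dictionary, the matching strategy, or the feature set, so the lexicon, the greedy longest-match lookup, the `B-DICT`/`I-DICT`/`O` tag names, and the function names `dictionary_tags` and `token_features` are all assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy lexicon: each entry is a tuple of lowercased tokens.
LEXICON: Set[Tuple[str, ...]] = {
    ("aspirin",),
    ("acetylsalicylic", "acid"),
    ("sodium", "chloride"),
}


def dictionary_tags(tokens: List[str],
                    lexicon: Set[Tuple[str, ...]],
                    max_len: int = 5) -> List[str]:
    """Greedy longest-match lookup; returns B-DICT / I-DICT / O per token."""
    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span first, then shrink.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in lexicon:
                matched = n
                break
        if matched:
            tags[i] = "B-DICT"
            for j in range(i + 1, i + matched):
                tags[j] = "I-DICT"
            i += matched
        else:
            i += 1
    return tags


def token_features(tokens: List[str],
                   lexicon: Set[Tuple[str, ...]]) -> List[Dict[str, object]]:
    """Per-token feature dicts mixing surface cues with dictionary tags."""
    dict_tags = dictionary_tags(tokens, lexicon)
    features = []
    for i, tok in enumerate(tokens):
        features.append({
            "lower": tok.lower(),
            "has_digit": any(c.isdigit() for c in tok),
            "has_hyphen": "-" in tok,
            "suffix3": tok[-3:].lower(),
            # Dictionary evidence for the current token and its neighbours.
            "dict": dict_tags[i],
            "prev_dict": dict_tags[i - 1] if i > 0 else "BOS",
            "next_dict": dict_tags[i + 1] if i + 1 < len(tokens) else "EOS",
        })
    return features


tokens = "Patients received acetylsalicylic acid or aspirin daily".split()
feats = token_features(tokens, LEXICON)
```

Feature dictionaries in this shape are one common input format for CRF toolkits. The dictionary tag acts as a hint rather than a final decision, so a trained model could still learn to override dictionary matches that are wrong in context.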

Similar resources

Chemical name recognition with harmonized feature-rich conditional random fields

This article presents a machine learning-based solution for automatic chemical and drug name recognition on scientific documents, which was applied in the BioCreative IV CHEMDNER task, namely in the chemical entity mention recognition (CEM) and the chemical document indexing (CDI) sub-tasks. The proposed approach applies conditional random fields with a rich feature set, including linguistic, o...

DBCHEM: A Database Query Based Solution for the Chemical Compound and Drug Name Recognition Task

We propose a method, named DBCHEM, based on database queries for the chemical compound and drug name recognition task of the BioCreative IV challenge. We prepared a database with 145 million entries containing compound and drug names, their synonyms, and molecular formulas. PubChem Power User Gateway (PUG) system is used to construct the database. Candidate chemical and drug names are identifie...

Patent mining: combining dictionary-based and machine-learning approaches

Exploration of the chemical patent space is essential for early-stage medicinal chemistry activities. The BioCreative CHEMDNER-patents task focuses on the recognition of chemical compounds in patents. This includes recognition of chemical named entities in patents (CEMP), classification of chemical-related patent titles and abstracts (CPD), and recognition of genes and proteins in patent abstra...

Exploring Extensions to Machine-learning based Gene Normalisation

One of the foundational text-mining tasks in the biomedical domain is the identification of genes and protein names in journal papers. However, the ambiguous nature of gene names means that the performance of information management tasks such as query-based retrieval will suffer if gene name mentions are not explicitly mapped back to a unique identifier in order to resolve issues relating to sy...

Recognition of chemical entities: combining dictionary-based and grammar-based approaches

BACKGROUND The past decade has seen an upsurge in the number of publications in chemistry. The ever-swelling volume of available documents makes it increasingly hard to extract relevant new information from such unstructured texts. The BioCreative CHEMDNER challenge invites the development of systems for the automatic recognition of chemicals in text (CEM task) and for ranking the recognized co...

Publication date: 2013